<html>
<head><meta charset="utf-8"><title>Casting `extern &quot;Rust&quot;` function references · general · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/index.html">general</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html">Casting `extern &quot;Rust&quot;` function references</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="229266455"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229266455" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Goat <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229266455">(Mar 08 2021 at 08:57)</a>:</h4>
<p>Casting a pointer to a function reference in Rust is impossible (for <a href="https://stackoverflow.com/questions/36645660/why-cant-i-cast-a-function-pointer-to-void">a valid reason</a>), but transmuting between different function reference types is always technically possible, since different function reference types always have the same size.</p>
<p>However, it's unclear whether it's <em>sound</em> to perform such a transmute. Consider the following:</p>
<div class="codehilite" data-code-language="Rust"><pre><span></span><code><span class="k">use</span><span class="w"> </span><span class="n">core</span>::<span class="n">mem</span><span class="p">;</span><span class="w"></span>
<span class="k">fn</span> <span class="nf">takes_reference</span><span class="p">(</span><span class="n">x</span>: <span class="kp">&amp;</span><span class="kt">i32</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="fm">println!</span><span class="p">(</span><span class="s">"They gave me a reference to {}!"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="k">fn</span> <span class="nf">takes_pointer</span><span class="p">(</span><span class="n">_x</span>: <span class="o">*</span><span class="k">const</span><span class="w"> </span><span class="kt">i32</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="fm">println!</span><span class="p">(</span><span class="s">"They gave me a pointer..?"</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="k">fn</span> <span class="nf">takes_function</span><span class="p">(</span><span class="n">f</span>: <span class="nc">for</span><span class="o">&lt;'</span><span class="na">a</span><span class="o">&gt;</span><span class="w"> </span><span class="k">fn</span><span class="p">(</span><span class="o">&amp;'</span><span class="na">a</span><span class="w"> </span><span class="kt">i32</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="n">f</span><span class="p">(</span><span class="o">&amp;</span><span class="mi">87</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="n">takes_function</span><span class="p">(</span><span class="n">takes_reference</span><span class="p">);</span><span class="w"> </span><span class="c1">// Valid, safe</span>
<span class="w">    </span><span class="kd">let</span><span class="w"> </span><span class="n">takes_pointer_as_takes_reference</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">unsafe</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">        </span><span class="c1">// The requirement of validity of the passed reference is weakened, so</span>
<span class="w">        </span><span class="c1">// it's seemingly safe, or is it?</span>
<span class="w">        </span><span class="n">mem</span>::<span class="n">transmute</span><span class="p">(</span><span class="n">takes_pointer</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">fn</span><span class="p">(</span><span class="o">*</span><span class="k">const</span><span class="w"> </span><span class="kt">i32</span><span class="p">))</span><span class="w"></span>
<span class="w">    </span><span class="p">};</span><span class="w"></span>
<span class="w">    </span><span class="n">takes_function</span><span class="p">(</span><span class="n">takes_pointer_as_takes_reference</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Is the transmute safe there? I'm more inclined to say "no" than "yes", since the <code>extern "Rust"</code> ABI is technically entirely undefined and thus different function reference types are entirely unrelated. However, <a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=6e357b3999c6a51da46550457870f6d1">running this in the playground</a> reveals that the compiler does not currently exploit that and considers two functions to have the same ABI if all their arguments and their return values have the same ABI (<code>#[repr(transparent)]</code> compatibility, which references and pointers have). Is this intentional and guaranteed to be safe, or not?</p>



<a name="229268070"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229268070" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Lokathor <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229268070">(Mar 08 2021 at 09:11)</a>:</h4>
<p>I think this is a "happens to work" situation.</p>



<a name="229277087"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229277087" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Konrad Borowski <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229277087">(Mar 08 2021 at 10:31)</a>:</h4>
<p>This is not something the language guarantees, while it's very unlikely, the ABI could change in such a way that those could end up becoming incompatible.</p>



<a name="229355235"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229355235" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Goat <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229355235">(Mar 08 2021 at 19:01)</a>:</h4>
<p>I probably should clarify why I'm asking this. I'm the maintainer of the <a href="https://crates.io/crates/thin_trait_object"><code>thin_trait_object</code></a> macro, which is an alternative implementation of trait objects which is stored as a thin pointer, hence the name. This layout makes it possible to pass them over the FFI boundary (implement traits and call trait object methods from C) and improves cache efficiency for <code>Vec</code>s of boxed trait objects.</p>
<p>The current design is that the vtable contains function references with the receiver (<code>&amp;self</code>/<code>&amp;mut self</code>) replaced with a <code>*mut ::core::ffi::c_void</code>, which points to the beginning of the trait object (the first field is the pointer to the vtable, everything after that is the value itself). Upon creation of a trait object from a concrete type, the vtable is filled in with thunks which perform pointer arithmetic, converting the <code>*mut c_void</code> (pointer to the vtable pointer) to a reference to the contained value, then calling the concrete implementation (at the assembly level, the address of the concrete implementation is hardcoded into the thunk assembly). <strong>I call this approach the "double hop mode"</strong>, because virtual dispatch is performed in two "hops", from TTO to thunk and from thunk to implementation.</p>
<p>A more efficient approach can be taken. Instead of the boxed trait object calling the thunks to adjust the receiver and call the implementation, the implementations can be stored in the vtable directly, alongside with the size and alignment of the contained type. The boxed trait object's <code>Foo</code> implementation can perform the pointer arithmetic itself and then call the implementation directly, reinterpreting the calculated pointer to the value as a reference to the value. <strong>This is single hop mode</strong>, because virtual dispatch is performed in one hop, from TTO to implementation, no thunks involved.</p>
<p>I don't feel comfortable with implementing single hop mode to be technically unsound but safe in practice. That goes against Rust's core safety principles, and I don't want <code>thin_trait_object</code> to become unreliable in that way. Missing out on the optimization either incurs one additional function call if the implementation isn't inlined into the thunk or executable bloat if it is. A safe opion would be to only allow single hop mode for well-specified ABIs, but then code would have to use <code>extern "C"</code> for pure Rust code, missing out on other optimizations <code>extern "Rust"</code> may deliver by not being entirely well-defined.</p>
<p>The middle ground would be to leave <code>extern "Rust"</code> undefined in general, but define one case of ABI compatibility, when all arguments and the return value are also ABI-compatible (<code>#[repr(transparent)]</code>-like relationship, as it is with pointers and references). That would require an RFC, but I'm still uncertain and would like to hear some feedback before I proceed to write it.</p>



<a name="229592958"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229592958" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Daniel Henry-Mantilla <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229592958">(Mar 10 2021 at 02:39)</a>:</h4>
<p>I think that all the transmuted types are within the same "<code>#[repr(transparent)]</code> class", then it ought to be fine; that's the very <em>raison d'être</em> of <code>#[repr(transparent)]</code>. That is, <code>#[repr(transparent)]</code> not affecting the ABI seems like a stronger argument than <code>#[repr(Rust)]</code> not having any ABI guarantee whatsoever.<br>
The other question is whether Rust references are in the same "<code>#[repr(transparent)]</code> class" as raw pointers <span aria-label="wink" class="emoji emoji-1f609" role="img" title="wink">:wink:</span>, and that, I don't know.</p>



<a name="229594349"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229594349" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> isHavvy <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229594349">(Mar 10 2021 at 02:57)</a>:</h4>
<p>Raw pointers and references are guaranteed to have the same layout. There are semantic differences in what you can do with the data the point to though.</p>



<a name="229594561"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229594561" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> isHavvy <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229594561">(Mar 10 2021 at 03:00)</a>:</h4>
<p>I'm generally against trying to specify anything about <code>#[repr(Rust)]</code> personally, but other things have already been defined (like the size of <code>Option</code>)</p>



<a name="229606675"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229606675" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229606675">(Mar 10 2021 at 05:45)</a>:</h4>
<blockquote>
<p>Casting a pointer to a function reference in Rust is impossible (for a valid reason), but transmuting between different function reference types is always technically possible, since different function reference types always have the same size.</p>
</blockquote>
<p>So, first of all, POSIX and Windows both guarantee that casting between functions pointers and at least <code>void*</code> is fine. Rust also allows <code>some_fn as usize</code>, and so I've for the most part assumed this was deliberately supported (but I guess we also allow <code>some_fn as u32</code> and just, why...). IMO Rust should take the step and guarantee this for function pointers, for the same reason it promises any of the stuff it promises about the system (8 bit bytes, for example): it maps with 99% of programmers expect, and supporting the weird edge-case hardware only lead to more bugs.</p>
<p>Anyway, more generally, the reason cited on stack overflow (ia64) is inaccurate, although it is a common misconception — IA64 could have, but explicitly <em>avoided</em> having a different function/data pointer representation because it would have broken too much — including the ability to support posix/windows. It seems highly likely that future platforms like it will do the same trick, which should always work.</p>
<hr>
<p>Anyway, keep reading if you want to know how stuff this works on itanium, since I guess this is where I type too much about a thing, and try to clear up a common misconception fruitlessly. I've actually programmed for IA64, and on both unix and windows, it's function pointers are the same size as its data pointers — 64 bits.</p>
<p>However, it <em>is</em> true that on Itaniu,m a single pointer alone is insufficient to indirectly call a function. You need both the address of the first instruction of the function, and the value that a specific register should have when accessing a function (gp, the "global pointer", used to access globals). Together, these get packaged into a 2-pointer struct known as a function descriptor that looks like this:</p>
<div class="codehilite" data-code-language="Rust"><pre><span></span><code><span class="c1">// Note: Despite what is commonly believed, this is not what a function pointer is on ia64!</span>
<span class="cp">#[repr(C)]</span><span class="w"></span>
<span class="k">struct</span> <span class="nc">Ia64FuncDescriptor</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">    </span><span class="c1">// really this is more like NonNull&lt;c_void&gt;, but for clarity</span>
<span class="w">    </span><span class="c1">// about what the *real physical representation is*, I'm using u64</span>
<span class="w">    </span><span class="n">address_of_first_instruction</span>: <span class="kt">u64</span><span class="p">,</span><span class="w"></span>
<span class="w">    </span><span class="n">value_of_gp</span>: <span class="kt">u64</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>So, this is 128 bits, and is the information needed to call an indirect function, so naively we would need to have function pointers and data pointers have different sizes... Thankfully, this is <strong>not</strong> a function pointer is on IA64, it's what a function pointer points to.</p>
<p>That is, <code>extern "C" fn()</code> is just a <code>NonNull&lt;Ia64FuncDescriptor&gt;</code>. Or something similarly shaped anyway. This means that calling a function pointer is slightly more work than on most arches (in particular, it requires reading through wherever the descriptor is located), but it's a tradeoff. In exchange, this is directly a data pointer, and thus is obviously represented identically other data pointers, and so there's no issue.</p>
<p>So, while this has to be <code>extern "C" fn</code>-on-itanium (for existing OSes anyway), perhaps <code>extern "Rust" fn</code>-on-itanium would decide to just be use the whole 128 bit function descriptor inline, to avoid that indirection. Nothing prevents this from working, however it would be a nontrivial amount of work, I suspect.</p>
<p>Note that I think that the general technique here (where they turn a "fat" function pointer into "thin" data pointer by pointing to the data) is probably what future platforms that need extra data for indirect calls would do too.</p>
<p>Anyway, since I'm speaking the truly unthinkable (that information on stack overflow is wrong), here is a citation from the IA64 ABI documentation for unix: <a href="https://www.csee.umbc.edu/portal/help/architecture/24537001.pdf">https://www.csee.umbc.edu/portal/help/architecture/24537001.pdf</a></p>
<blockquote>
<p>On the IA-64 architecture, when one function calls another it is the caller's responsibility to reset the global pointer (gp) to the correct value for the object containing the called function.<br>
Thus, to call a function a caller needs two pieces of information:  the address of the function and the value its global pointer should have. These two pieces of information are contained in a structure known as a function descriptor (see Conventions). <br>
So that a function pointer may be passed from function to function and still retain enough information to enable the function to be called, a function pointer is defined to be a pointer to the function descriptor for that function.</p>
</blockquote>
<hr>
<p>Now, are there any ABIs where the sizes of function and data pointers differ? Yes. I can think of at least one that actually mattered in the past (and certainly if you go far enough back its the wild west — there were machines with 35 bit words, so I assume that everything terrible is possible). But, for stuff people have heard of: On old segmented x86 this could <em>sort of</em> happen... It would depend on compiler settings.</p>
<p>These generally let you picked whether or not to use segmentation for code and data pointers separately. For example, under Turbo C, selecting the "medium" memory model would have the compiler use "far pointers" (32 bits) for code, but near pointers (16 bits) for data.</p>
<p>That said, even then you could select compiler options that would let you have uniform function and data pointer representation. So Rust could still support this platform, just less efficiently (in practice there's a lot of language features we'd need to support this platform efficiently anyway, so probably lets not sweat that one). Anyway, I wouldn't worry about it that much — C only barely supported it too.</p>



<a name="229645494"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/Casting%20%60extern%20%22Rust%22%60%20function%20references/near/229645494" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/Casting.20.60extern.20.22Rust.22.60.20function.20references.html#229645494">(Mar 10 2021 at 12:08)</a>:</h4>
<p>(Oh, this was a bit much for general — i thought this was UCG or something, sorry)</p>



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>